SUMMARY

The DNA sequence of the filamentous phage f1, consisting of 6407 nucleotides, has been determined. When compared with the DNA sequence of the related filamentous phage fd (Beck et al., 1978), the f1 sequence is one nucleotide shorter and differs in 180 positions from the fd DNA. Only ten of these base exchanges cause amino acid exchanges in the known gene products. Most of the exchanges in f1 are the same as in M13 (Van Wezenbeek et al., 1980), showing a near identity of these two phages (there are only 59 nucleotide differences). Regulatory units for replication, transcription, and translation are in their essential parts identical in all three phages.



INTRODUCTION

The genomes of the filamentous Escherichia coli phages, e.g. fd, f1 and M13, consist of single-stranded circular DNAs of about 6400 nucleotides. These code for at least nine genes whose products are involved in phage DNA replication, phage assembly and phage capsid synthesis (Marvin and Hohn, 1969; Ray, 1977). Since the propagation of the filamentous phages is catalysed mainly by host functions, their genomes have been studied for many years as model systems of regulation in E. coli for replication, transcription, and translation (recent reviews Ray, 1977; Schaller, 1979). In addition, the DNA of these phages was used early as a model system in the development of methods for the structural analysis of genomes. Ling (1972) sequenced a number of large pyrimidine oligonucleotides; Oertel and Schaller (1972) determined the sequence and the order of pyrimidine tracts in a pyrimidine-rich segment of the fd DNA; and Sanger et al. (1973; 1974) deduced the sequence of 89 nucleotides in the f1 DNA (pos. 6321-6408) using the ribo-substitution technique. Three ribosome-binding sites were sequenced by Pieczenik et al. (1974), and the promoter site of gene X by Schaller et al. (1975) and Sugimoto et al. (1975). The DNA from the origin of replication, first isolated and characterized from a pre-initiation complex (Schaller et al., 1976), was the first continuous stretch of fd DNA to be analysed (Gray et al., 1978), using the rapid methods for DNA sequencing newly developed by Sanger and Coulson (1975) and Maxam and Gilbert (1977). The DNA sequence of the origin of replication of defective interfering particles of f1 was also analysed (Ravetch et al., 1979). Takanami et al. (1976) and Sugimoto et al. (1977) analysed the region of genes VII and VIII and the central terminator of transcription by sequencing RNA produced by in vitro transcription of restriction fragments.

In 1978 the total fd DNA sequence was determined and published, first in a preliminary version (Schaller et al., 1978), followed by the final sequence in a short publication (Beck et al., 1978). At that time about 90% of the DNA sequence of the related phage f1 had also been analysed, mainly to confirm the gene reading frames by identifying the numerous silent base exchanges between the f1 and fd DNA. In this paper we discuss the experimental details of the fd and f1 DNA sequence analysis and the derived structures of genes and regulatory signals. In addition, we present the completed f1 DNA sequence. It differs by 180 base exchanges from the fd DNA sequence, only a few of which cause amino acid changes in the gene products. As the DNA of the other closely related filamentous phage M13 has also been sequenced completely (Van Wezenbeek et al., 1980), a comparison of the three sequences is presented.

Abbreviations: bp, base pairs; IG, intergenic region; pos., position; pDNA, DNA protected by RNA polymerase against pancreatic DNase digestion; RF, replicative form.



MATERIALS AND METHODS

(a) Bacteriophage and enzymes 

The wild-type bacteriophage f1 and the f1 nonsense mutants amRS, amR7, amR124, and amR143 were from N.D. Zinder, New York. The bacteriophage fd was from H. Hoffmann-Berling, Heidelberg. The fd strain 478, which was isolated as a single plaque from the fd stock and used in the sequence analysis, differs in at least one position (1859) from the fd phage from ATCC, which was sequenced in part in the laboratory of M. Takanami, Kyoto. The viral DNA was converted into the double-stranded form (RF) in vitro by oligonucleotide-primed synthesis as described (Gray et al., 1978). The restriction endonucleases HpaII, HaeIII, HhaI, HinfI, HgaI, AluI, and TaqI were prepared essentially as described by Roberts et al. (1976). AccII, HphI and MboII were purchased from New England Biolabs, and Sau3A was a gift from H. Streeck, Munich. Polynucleotide kinase and calf intestinal phosphatase were from Boehringer GmbH, Mannheim. [γ-32P]ATP (spec. act. approx. 6000 Ci/mmol) was prepared as described by Johnson and Walseth (1979).

(b) 5'-End-labeling of DNA 

Restriction fragments were dephosphorylated either by adding phosphatase to the cleavage mixture together with the restriction endonuclease, or, in the case of flush-ended or 3'-extended ends, in 50 mM Tris pH 8 at 60°C (0.02 units phosphatase per 20 µl assay; incubation time 30-60 min). The samples were phenol-extracted, desalted on a small Sephadex G75 column (2 ml disposable pipette) in 10 mM ammonium bicarbonate pH 8.6, and lyophilised. This was found to be the best method for complete removal of the phosphatase. Phosphorylation with [γ-32P]ATP and polynucleotide kinase was carried out essentially as described by Maxam and Gilbert (1980). In general, 1-2 pmol cleaved RF DNA were used per assay.

(c) DNA sequencing methods 

Gel electrophoresis, elution of DNA from polyacrylamide gels, separation of labeled fragment ends (either by a secondary restriction enzyme cleavage or by separation of denatured strands), and the base-specific chemical modification were performed essentially as described by Maxam and Gilbert (1980). Depurination was carried out in 66% formic acid for 2-8 min at 20°C, followed by 3-fold dilution with water, three ether extractions, lyophilisation, and hydrolysis in 1 M piperidine at 90°C for 1 h in an oven. Some fragments analysed on long (1 m) sequencing gels (0.4 mm thick) could be read up to position 450.



RESULTS AND DISCUSSION 
(a) Sequencing strategy 

For the complete analysis of the DNAs of fd and f1, ten different restriction endonucleases were used (see Fig. 1). Usually the DNA was cleaved with a particular restriction enzyme, and the resulting fragments were end-labeled as a mixture and separated on polyacrylamide gels. Most of the radioactive fragments were used for the sequence analysis. Many restriction maps (e.g. for HhaI, HinfI, HphI, MboII, Sau3A, and TaqI) were not established prior to sequencing but resulted from matching overlapping sequences.
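Building a map from overlapping sequences comes down to finding where the end of one read matches the start of another. The following is a minimal sketch, not the authors' procedure: it assumes error-free reads from the same strand and a hypothetical minimum overlap of 4 nucleotides, and merges reads greedily by their longest suffix-prefix overlap. The function names and reads are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap until none remain."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no further overlaps: remaining contigs are final
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


reads = ["GATCAAGCTT", "AGCTTCCGGA", "CCGGATTACA"]  # hypothetical overlapping reads
print(greedy_assemble(reads))  # ['GATCAAGCTTCCGGATTACA']
```

Greedy merging is the simplest choice; in repetitive stretches it can join reads at the wrong place, which is why reading through each site from an independent start matters.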

For the restriction endonucleases HgaI and TaqI, each of which has only ten cleavage sites in the fd DNA, the restricted and end-labeled DNA was further cleaved by a second restriction enzyme before separation on a gel. By comparing the fragments present before and after the second cleavage, it could be deduced which new fragments had been generated by the second digestion. These were therefore labeled at only one end and could be used directly in the subsequent analysis. With this method a separate gel for the recleaved fragments was unnecessary. However, contamination by neighbouring bands or background was often higher, which interfered with extended reading of nucleotide sequences.
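The before/after comparison is essentially a multiset difference of band sizes. The sketch below assumes the sizes (in bp) of the labelled bands in both lanes are known exactly. The lane values and function names are invented for illustration.

```python
from collections import Counter


def new_bands(before: list[int], after: list[int]) -> list[int]:
    """Labelled bands present after the second digestion but not before it.

    These pieces were produced by the second enzyme, so each keeps the
    5'-label only at the end left by the first enzyme. (A piece cut on
    both sides by the second enzyme carries no label and is not seen.)
    """
    return sorted((Counter(after) - Counter(before)).elements())


def cut_bands(before: list[int], after: list[int]) -> list[int]:
    """Bands that disappeared, i.e. fragments cleaved by the second enzyme."""
    return sorted((Counter(before) - Counter(after)).elements())


# Hypothetical lanes: a 1200 bp fragment is cut into 700 + 500 bp pieces.
before = [1200, 850, 400]
after = [850, 700, 500, 400]
print(new_bands(before, after))  # [500, 700]
print(cut_bands(before, after))  # [1200]
```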

The restriction enzyme HaeIII also cleaves single-stranded DNA efficiently (Blakesley and Wells, 1975). With this enzyme, single-stranded fragments could be prepared directly and used for sequencing without secondary cleavage.

Although it was usually possible to read the DNA sequences clearly, 85% of the fd DNA was sequenced in both strands to avoid mistakes that could occur at methylated bases (Ohmori et al., 1978), in regions with a distinct secondary structure, or through incorrect reading or processing of the sequence information. Care was taken that all restriction sites used to generate fragments were read through from alternative starts. This is particularly important in repetitive sequences that may contain closely spaced repeating restriction sites. One example occurs in the fd DNA sequence around position 2390, where a sequence of 18 nucleotides consisting of two small HpaII fragments was not included in the preliminary version of the fd DNA sequence (Schaller et al., 1978). The nucleotide sequences were stored and processed using computer programmes written in the computer language APL and established by Osterburg and Sommer (1981).
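Sequencing both strands provides a simple consistency check: the read from the complementary strand, once reverse-complemented, must match the first read base for base. The sketch below assumes both reads are given 5'→3' as A/C/G/T strings covering the same stretch. The function names and the example read are hypothetical, not part of the authors' APL programmes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_conflicts(top: str, bottom: str) -> list[int]:
    """0-based positions (top-strand coordinates) where the two strand reads disagree.

    `bottom` is the sequence read from the complementary strand, 5'->3'.
    """
    expected = reverse_complement(bottom)
    if len(expected) != len(top):
        raise ValueError("reads must cover the same stretch")
    return [i for i, (a, b) in enumerate(zip(top, expected)) if a != b]


top = "GATCCGGATTAC"
bottom = reverse_complement(top)        # "GTAATCCGGATC"
print(strand_conflicts(top, bottom))    # []
misread = "C" + bottom[1:]              # first base of the lower-strand read misread
print(strand_conflicts(top, misread))   # [11]
```

A base misread at the 5' end of one strand shows up at the 3' end of the other, which is why conflicts are reported in a single coordinate system.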



(b) The DNA sequence 

The fd sequence was derived by reading serially overlapping fragments, using most of the different restriction cuts and both DNA strands, as shown in Fig. 1. Sequencing of f1 DNA was started somewhat later. Because we could refer to the completed fd sequence, only about 50% of the f1 sequence analysis was carried out in both strands. Nevertheless, some regions of f1 were analysed in more detail than in fd, and the f1 sequence was also used to confirm fd DNA sequences that had been determined in one strand only.

Fig. 2 shows the combined sequences of the fd and the f1 DNA. The continuous sequence corresponds to the fd DNA sequence as published in 1978 (Beck et al., 1978). About 97% of the f1 DNA is identical to the fd DNA. There are 180 base changes, which are indicated above the fd DNA sequence. About 150 of them lie within genes, but only 10 actually cause amino acid changes. The others are "silent" alterations, i.e. they involve variable (degenerate) bases in the codons. This fact had already been used as indirect evidence for the correct reading frames of the genes in the filamentous phage genome (Schaller et al., 1978; and see below). Base changes present in the M13 DNA sequence (Van Wezenbeek et al., 1980) are also included in Fig. 2. Many of them coincide with the changes in f1, demonstrating that f1 and M13 are more closely related to each other than to fd.
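Once the reading frame is fixed, deciding whether a base change is silent is a mechanical step. Translate the affected codon before and after the change with the standard genetic code, then compare the two amino acids. The sketch below assumes an in-frame coding string and 0-based positions within it. The toy gene is invented and is not taken from fd or f1.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(gene: str, pos: int, new_base: str) -> str:
    """Return 'silent' or 'replacement' for a base change at 0-based `pos`.

    `gene` must begin with the first base of a codon (i.e. be in frame).
    """
    start = pos - pos % 3
    old_codon = gene[start:start + 3]
    offset = pos % 3
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    same = CODON_TABLE[old_codon] == CODON_TABLE[new_codon]
    return "silent" if same else "replacement"


gene = "ATGGCTAAAGGT"  # toy reading frame: Met-Ala-Lys-Gly
print(classify_substitution(gene, 5, "C"))  # GCT->GCC, Ala->Ala: silent
print(classify_substitution(gene, 6, "G"))  # AAA->GAA, Lys->Glu: replacement
```

Applied to every exchange inside a gene, this kind of tally separates the roughly 150 intragenic changes into the few replacements and the many silent ones.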

A series of partial sequences from fd DNA and f1 DNA published earlier (see above) could be fitted into the complete sequences. All agree essentially with our data. Two changes had to be made in regulatory regions. The first is at the promoter of gene VIII, where the sequence at the start of transcription is a G5 run (not G4 as in Takanami et al., 1976). The second is at the central terminator, where the sequence at the end point of transcription is C2T9 (not C2T8C as in Ling, 1972, or C?T8 as in Sugimoto et al., 1977). An f1 DNA sequence of the intergenic region (IG) between genes IV and II (positions 5500-6000; Fig. 3), analysed by Ravetch et al. (1977; 1979), contains several deletions of one or two nucleotides when compared with the corresponding fd DNA sequence. None of these deletions could be confirmed in our f1 DNA sequence. The corresponding region in M13 was analysed first by Suggs and Ray (1978) and confirmed by the M13 DNA sequence of Van Wezenbeek et al. (1980). It agrees with the f1 sequence except for [?] positions. Whereas M13 and f1 are almost identical in this region, fd differs from these two phages in 23 positions of the intergenic region.
In addition to the published results, other fd DNA sequence data for the region between positions 300 and 1600 were made available (M. Takanami, personal communication). In this analysis a difference was noticed between our fd DNA and that used by Takanami: a G→A exchange at position 1859 creates an additional HinfI site in Takanami's DNA. The altered restriction fragment pattern of this enzyme was demonstrated experimentally (M. Takanami, personal communication).
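Whether a single substitution creates or destroys a HinfI site (recognition sequence GANTC) can be checked by scanning the sequence before and after the change. The fd sequence around position 1859 is not reproduced in this text, so the short window below is a hypothetical stand-in. The helper names are likewise invented for this sketch.

```python
import re

HINFI = re.compile("(?=GA[ACGT]TC)")  # HinfI site GANTC; lookahead finds overlapping sites


def hinfi_sites(seq: str) -> set[int]:
    """0-based start positions of HinfI sites in a linear sequence."""
    return {m.start() for m in HINFI.finditer(seq)}


def substitute(seq: str, pos: int, base: str) -> str:
    """Return `seq` with the base at 0-based `pos` replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def site_changes(seq: str, pos: int, base: str) -> tuple[set[int], set[int]]:
    """(created, destroyed) HinfI site starts caused by one base change."""
    before = hinfi_sites(seq)
    after = hinfi_sites(substitute(seq, pos, base))
    return after - before, before - after


window = "TTGGGTCTT"                      # hypothetical local sequence, no HinfI site
print(site_changes(window, 3, "A"))      # G->A gives GAGTC: ({2}, set())
```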

(c) Restriction maps

When the work on the fd sequence analysis started in 1977, several restriction maps (HpaII, HaeIII, AluI, …) had already been completed for fd, f1 and/or M13. During the sequence analysis these maps were refined, and maps for many other restriction enzymes were established. The maps of TaqI, HhaI, HinfI, HaeIII, Sau3A, BamHI, HphI, MboII, and AccII were checked experimentally by comparing the fragment lengths derived from the DNA sequence with the corresponding fragment patterns on polyacrylamide gels. The recognition sites of the best-known restriction enzymes in the three filamentous phage DNAs are listed in Table I. In nondenaturing polyacrylamide gels, some fragments (e.g. in fd HpaII-B (pos. …2-3371), HpaII-H (pos. 5615-5996) and TaqI-H (pos. 5648-6041)) migrate more slowly than other fragments of comparable length. Such fragments usually contain extended inverted repeats, which may form secondary structures divergent from the normal double-helical form of DNA.
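Comparing fragment lengths derived from the sequence with gel patterns requires locating every recognition site on the circular genome. The sketch below assumes palindromic recognition sequences, so scanning one strand finds sites on both. That excludes non-palindromic enzymes such as HphI and MboII. The `SITES` dictionary and the toy sequence are illustrative only, not data from the paper. Every cut of a given enzyme lies at the same offset within its site, so the distances between consecutive site starts equal the fragment lengths (single-stranded overhangs ignored).

```python
import re

# Recognition sequences of a few palindromic enzymes (N = any base).
SITES = {
    "HpaII": "CCGG",
    "HaeIII": "GGCC",
    "AluI": "AGCT",
    "TaqI": "TCGA",
    "Sau3A": "GATC",
    "HinfI": "GANTC",
}


def site_positions(genome: str, site: str) -> list[int]:
    """0-based start positions of `site` on a circular genome.

    The genome is extended by len(site) - 1 bases so that sites spanning
    the numbering origin are also found; a lookahead regex allows overlaps.
    """
    pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
    extended = genome + genome[: len(site) - 1]
    return [m.start() for m in pattern.finditer(extended)]


def circular_fragments(genome: str, site: str) -> list[int]:
    """Fragment lengths (bp) after complete digestion, largest first."""
    cuts = site_positions(genome, site)
    if not cuts:
        return []  # no site: the circle is not cut
    n = len(genome)
    # Distance to the next cut, wrapping around the origin; a single cut gives a full-length linear molecule.
    lengths = [(cuts[(i + 1) % len(cuts)] - c) % n or n for i, c in enumerate(cuts)]
    return sorted(lengths, reverse=True)


toy = "CCGGATATATCCGGTTTTTTTTTTGAATC"   # hypothetical 29 bp circle
print(site_positions(toy, SITES["HpaII"]))      # [0, 10]
print(circular_fragments(toy, SITES["HpaII"]))  # [19, 10]
```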

TABLE I

Restriction endonuclease recognition sites

The sites for fd, f1 and M13, as found by computer analysis, are compiled (cf. also Fuchs et al., 1980). Italic numbers represent cleavage sites that are experimentally proven. None of the three phages has cleavage sites for the enzymes AvaII, AvaIII, BclI, BglI, BglII, EcoRI, HindIII, KpnI, MstI, PstI, PvuI, PvuII, SacI, SacII, SalI, SmaI, XhoI, and XbaI.
a Site exists in fd only; b site exists in f1 only; c site exists in M13 only.

[Table I entries and the following figure pages are not legible in the source scan.]

(d) Genes and gene products

A genetic map of the eight known genes of the filamentous phage was established by Lyons and Zinder (1972). It was later correlated with the sizes of the gene products determined on SDS gels (Model and Zinder, 1974) and with the physical maps (Vovis et al., 1975). Beyond these approximate positions and gene lengths, the amino acid sequences of the gene V and gene VIII proteins (Nakashima and Konigsberg, 1974; Nakashima et al., 1974) were determined, allowing an exact correlation of these genes with the nucleotide sequence. The start sites of two further genes were determined or later confirmed by sequence analysis of the terminal amino acids of the proteins (gene III: Goldsmith and Konigsberg, 1977; gene II: Meyer et al., 1980). Start sites of three other, then unassigned, genes were determined by three ribosome-binding sites sequenced by Pieczenik et al. (1974). Two of these sites were subsequently correlated with genes V and VIII, the best characterized products of the filamentous phage. The third site defines the start of gene IV.

The exact positions of the genes are shown in the final nucleotide sequence (Fig. 2). Within the limits derived from the genetic map there is only one possible reading frame for each gene; the other reading frames contain many stop codons. Continuous reading frames longer than 30 amino acids that are not associated with known genes are also indicated in Fig. 2. In most cases translation is made unlikely by the absence of a Shine-Dalgarno sequence. The exceptions are two theoretical peptides of 65 and 42 amino acids starting at positions 3417 and 4528, respectively. There is no genetic evidence for additional genes in the filamentous phage.
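The search for continuous reading frames can be sketched as a scan of the three frames of one strand. The scan looks for ATG-initiated stretches longer than 30 amino acids before a stop codon. This minimal version rests on several assumptions. ATG is taken as the only start codon. The sequence is treated as linear, so a circular genome would also need the frames that span the origin. Only the strand given is scanned, and there is no test for a Shine-Dalgarno sequence. The function name and the test sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_aa: int = 30) -> list[tuple[int, int, int]]:
    """ATG-initiated reading frames encoding more than `min_aa` amino acids.

    Returns (start, end, frame): 0-based start of the ATG, end just past the
    stop codon, and the frame offset (0, 1 or 2). Only the given strand is
    scanned and the sequence is treated as linear.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the frame; internal ATGs are ignored
            elif start is not None and codon in STOPS:
                if (i - start) // 3 > min_aa:  # number of amino acids before the stop
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


demo = "CC" + "ATG" + "GCT" * 30 + "TAA" + "GG"   # synthetic 31-codon frame
print(open_reading_frames(demo))                  # [(2, 98, 2)]
```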
The arrangement of the genes reflects an economical use of DNA. The only two noncoding regions, between genes VIII and III and between genes IV and II, contain the central terminator of transcription and the origin of DNA replication, respectively. Gene IX overlaps by one nucleotide with each of the neighbouring genes VII and VIII. Most other genes are separated from each other by one or two nucleotides. These one or two nucleotides obviously serve to shift the reading frame of the following gene and thus to avoid the synthesis of fusion proteins in case of readthrough by suppression. This principle is supported by the deletion of one of the two nucleotides in the intergenic space between genes VI and I in f1 and M13 DNA compared with fd DNA (pos. 3195): in both cases the frame shift is maintained. In contrast to the icosahedral phages (ΦX174, G4), no genes in the filamentous phage genome overlap by more than one nucleotide, except for a short run of 20 nucleotides shared by genes I and IV. There is obviously no selection pressure on the genome length of these phages: by inserting heterologous DNA into the genome of fd, hybrid phages could be constructed that were longer than wild type (Herrmann et al., 1980).
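Why a spacer of one or two nucleotides (or a one-nucleotide overlap) prevents in-frame fusion proteins reduces to arithmetic modulo 3. A minimal sketch, assuming 1-based positions; the function name is ours and purely illustrative:

```python
def readthrough_frame_offset(stop_end, next_start):
    """Frame offset (0, 1 or 2) between two adjacent genes.

    stop_end   -- 1-based position of the last base of the upstream stop codon
    next_start -- 1-based position of the first base of the downstream start codon
    A ribosome reading through a suppressed stop stays in the upstream frame;
    only an offset of 0 would carry it into the downstream gene in frame and
    yield a fusion protein.
    """
    gap = next_start - (stop_end + 1)  # spacer length; -1 = one-nucleotide overlap
    return gap % 3  # Python's % is non-negative for a positive modulus
```

With a two-nucleotide spacer, as between genes VI and I in fd, the offset is 2; after the one-nucleotide deletion in f1 and M13 it is 1. Either way a readthrough ribosome stays out of frame with the downstream gene.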
Since mistakes can occur in establishing a DNA sequence for different reasons, as mentioned above, useful criteria are needed to check the derived reading frames of the genes. The best controls are data from protein sequence analysis.
Amino acid sequences, partial or complete, were available for genes V, VIII, III and II. Another control that contributes greatly to the credibility of the complete sequence derived from the DNA, the amino acid composition, was available for the gene II product.

There are also several ways to check the correctness of a reading frame at the nucleotide level. A first simple possibility is based on the fact that about 50% of all triplets within the genes end with a T residue, similar to the icosahedral phages ΦX174 and G4 (Sanger et al., 1977; Godson et al., 1978). For all amino acids, the filamentous phages mostly use the codons with the highest number of T's (see Table II). Although this cannot serve as exact proof that the sequence of a specific short region is correct, it can be used to confirm the reading frame over a longer distance. Secondly, the filamentous phages obviously do not have overlapping genes, and in the unused reading frames there is a stop codon every 30-40 nucleotides on average. A third test is a comparison of the DNA sequences of the closely related phages fd and f1: we determined 180 base exchanges, the majority of them within the genes. Only ten of these result in amino acid exchanges; the others are "silent" in the correct reading frame, i.e. they concern the variable bases of the codons. The fourth and most conclusive, but also most laborious, method used is the determination of the base exchanges in amber mutants. For almost all genes the DNA sequence of one or several amber mutants was analysed (see Table IV).
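Two of the nucleotide-level checks above, the bias towards T in the third codon position and the silent/replacement classification of fd/f1 differences, can be expressed compactly. A minimal sketch, assuming ungapped, equal-length coding sequences in uppercase DNA letters and the standard genetic code; the function names are ours:

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, "*" marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def t3_fraction(seq, frame):
    """Fraction of complete codons in `frame` (0, 1 or 2) whose third base is T."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return sum(c[2] == "T" for c in codons) / len(codons)


def best_frame(seq):
    """Frame with the highest T3 fraction (a heuristic, not a proof)."""
    return max(range(3), key=lambda f: t3_fraction(seq, f))


def classify_differences(cds_a, cds_b):
    """Count silent vs replacement codon differences between two aligned,
    equal-length coding sequences read from their first base."""
    counts = {"silent": 0, "replacement": 0}
    for i in range(0, len(cds_a) - 2, 3):
        a, b = cds_a[i:i + 3], cds_b[i:i + 3]
        if a != b:
            counts["silent" if CODE[a] == CODE[b] else "replacement"] += 1
    return counts
```

Applied to a real gene, the frame with the highest `t3_fraction` is the candidate reading frame, and an fd/f1 comparison read in that frame should classify most differences as silent.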

The genes are arranged in three functional groups in the genome: replication (genes II and V), capsid (genes IX, VIII, III and VI), and morphogenesis (genes I and IV). Judging from its position on the DNA, the gene VII protein (function unknown) could therefore either be involved in replication or be part of the virion.

The most significant functional and biochemical features of the gene products, and the criteria for localising the reading frames, are summarized in Table III.

TABLE II
Codon usage in fd

Phe   TTT  67    Ser   TCT  92    Tyr   TAT  65    Cys   TGT  16
Phe   TTC  39    Ser   TCC  33    Tyr   TAC  14    Cys   TGC   8
Leu   TTA  65    Ser   TCA  35    ochre TAA   5    opal  TGA   3
Leu   TTG  32    Ser   TCG   9    amber TAG   1    Trp   TGG  18

Leu   CTT  49    Pro   CCT  46    His   CAT  12    Arg   CGT  32
Leu   CTC  17    Pro   CCC   9    His   CAC   6    Arg   CGC  16
Leu   CTA   6    Pro   CCA  13    Gln   CAA  35    Arg   CGA   5
Leu   CTG  26    Pro   CCG  18    Gln   CAG  43    Arg   CGG   1

Ile   ATT  72    Thr   ACT  60    Asn   AAT  82    Ser   AGT  14
Ile   ATC  16    Thr   ACC  23    Asn   AAC  23    Ser   AGC  11
Ile   ATA  20    Thr   ACA  15    Lys   AAA  73    Arg   AGA  11
Met   ATG  33    Thr   ACG  11    Lys   AAG  34    Arg   AGG   5

Val   GTT  98    Ala   GCT  59    Asp   GAT  72    Gly   GGT  95
Val   GTC  18    Ala   GCC  16    Asp   GAC  38    Gly   GGC  51
Val   GTA  25    Ala   GCA  28    Glu   GAA  40    Gly   GGA   5
Val   GTG  11    Ala   GCG  17    Glu   GAG  31    Gly   GGG   9

Gene II shows two possible ATG start
sites, at positions 6007 and 6016. On the basis of a better Shine-Dalgarno sequence we predicted that the former corresponds to the protein start (Schaller et al., 1978). This was confirmed by determination of the N-terminus for 90% of the gene II product, using radiolabel Edman degradation (Meyer et al., 1980). However, about 30% of the protein showed amino acids in positions that correspond to a start at the second ATG codon. Whether the two proteins, which were co-isolated from a membrane fraction and cannot be distinguished on SDS gels, have different biological functions is not known.

The existence of a ninth gene in the filamentous phage genome, between genes VII and VIII, was already predicted from the preliminary fd sequence by Schaller et al. (1978). A gap of 94 nucleotides with no known coding or regulatory function contains a continuous reading frame whose first and last triplets each overlap by one nucleotide with the adjacent genes. The protein predicted from the sequence consists of 32 amino acids, with a composition (6 Ser, 2 Arg, no His) similar to that of the C-protein, a minor capsid component that has been detected in highly purified f1 and M13 phage (Simons et al., 1979). No amber mutants are known for gene IX. This is explained by the DNA sequence, which shows that amber codons can only be created by transversions, at positions 1223, 1249 and 1274, whereas the hydroxylamine treatment used to construct amber mutants of the filamentous phages (Lyons and Zinder, 1972) induces only transitions.
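Why hydroxylamine cannot produce amber mutants in gene IX follows from which codons lie one substitution away from TAG. Enumerating them shows that the only sense codons reachable by a single transition are CAG (Gln) and TGG (Trp); every other route to TAG needs a transversion. A small sketch, with function names chosen here for illustration:

```python
# Purine<->purine and pyrimidine<->pyrimidine exchanges.
TRANSITIONS = {("C", "T"), ("T", "C"), ("A", "G"), ("G", "A")}


def step_to_amber(codon):
    """Describe the single substitution turning `codon` into TAG, or None.

    Returns (codon position 1-3, old base, new base, kind), where kind is
    'transition' or 'transversion'.
    """
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(codon, "TAG")) if a != b]
    if len(diffs) != 1:
        return None  # already TAG, or more than one change needed
    i, a, b = diffs[0]
    kind = "transition" if (a, b) in TRANSITIONS else "transversion"
    return (i + 1, a, b, kind)


def transition_amber_sites(cds):
    """1-based codon numbers in `cds` that a single transition turns into TAG."""
    sites = []
    for n in range(len(cds) // 3):
        step = step_to_amber(cds[3 * n:3 * n + 3])
        if step is not None and step[3] == "transition":
            sites.append(n + 1)
    return sites
```

A coding sequence with no CAG or TGG codons in frame, as the text implies for gene IX, therefore has no amber sites accessible to a transition-only mutagen.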

The gene III protein contains a remarkably high proportion of glycine residues (16%). Most of these are clustered in repetitive sequences: the sequence Glu-Gly-Gly-Gly-Ser appears three times around amino acid position 95 and four times around position 255, accompanied at both sites by repetitions of Gly-Gly-Gly-Ser. In the DNA of an fd Tn5 derivative, a stretch of 30 nucleotides corresponding to two of the Glu-Gly-Gly-Gly-Ser repeats (amino acids 253-262) is deleted (Auerswald, 1979); these amino acids are obviously not essential for gene III function. Around amino acid 375 the protein also appears to be variable, since base exchanges at positions 2699, 2702 and 2710 result in amino acid changes in f1 and M13 relative to fd (Table V).
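The DNA positions in Table V follow from the amino acid positions by simple arithmetic on the circular genome. A minimal sketch, assuming fd numbering with a genome length of 6408 (one nucleotide more than the 6407 of f1) and start positions of 6007 for gene II (stated above), 1579 for gene III and 4221 for gene IV. The last two are our inference, chosen because they reproduce the Table V positions; they are not stated in this section.

```python
def codon_start(gene_start, codon, genome_length=6408):
    """1-based position of the first base of codon number `codon`
    (1 = start codon) in a gene beginning at `gene_start`, wrapping
    around the circular genome (default length assumes fd)."""
    return (gene_start - 1 + 3 * (codon - 1)) % genome_length + 1


# Gene II crosses the origin of the numbering, so its codon 249 wraps to 343.
print(codon_start(6007, 249))
```

The base actually exchanged may be the first, second or third of the codon, so a Table V position equals `codon_start(...)` plus 0, 1 or 2.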

Mr estimates of the gene III protein ranged from 55 000 to 68 000, depending on the SDS gel system used (Goldsmith and Koningsberg, 1977). Even the lowest value differs markedly from that derived from the DNA sequence (Mr 42 660). The unusual clustering of glycine residues may alter the binding of SDS and therefore the migration of the protein on gels.
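The sequence-derived Mr is essentially a sum of residue masses plus one water. A minimal sketch, assuming standard average residue masses (amino acid minus H2O, in daltons); which mass table the authors used, and whether any N-terminal processing was taken into account in their figure, is not stated in this section.

```python
# Average residue masses in daltons (free amino acid minus one H2O).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def relative_molecular_mass(protein):
    """Mr of a one-letter-code polypeptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def glycine_fraction(protein):
    """Fraction of residues that are glycine."""
    return protein.upper().count("G") / len(protein)
```

For the Glu-Gly-Gly-Gly-Ser repeat unit, `relative_molecular_mass("EGGGS")` gives about 405.36 and `glycine_fraction("EGGGS")` gives 0.6.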

The reading frame of gene IV extends 20 nucleotides back into the 3' end of gene I. Most of the nucleotide sequence of its ribosome-binding site is homologous to the ribosome-binding site of gene V. Sequence homologies are also recognizable in the first 20 nucleotides of the two genes. Moreover, parts of this homologous sequence are repeated within the coding region of gene IV (positions 3901-3931 and 4285-4305), in both cases centered around ATG codons, probably reflecting an evolutionary pathway. The reading frame of this gene ends with a TAG codon at position 5499. This stop codon lies directly at the beginning of the largest hairpin in the viral DNA. This structure may help to terminate transcription and/or translation, ensuring the correct length of the gene product even in UAG suppressor strains.

(e) Regulatory signals 

Structures of regulatory signals concerning all three levels of phage development (replication, transcription and translation) are recognizable on the DNA. The regulatory unit of replication lies in an intergenic region of 508 bp (IG) between the end of gene IV and the start of gene II. This DNA segment can be folded into several large hairpin structures (Fig. 3). The existence of these structures in the viral DNA was demonstrated by their resistance to S1 nuclease (Gray et al., 1978). Their significance is indicated by the fact that their sequence is conserved in all three filamentous phages. Although the IG is the most variable region of the filamentous phage genome (more than 5% of the bases of fd differ from f1 and M13), none of these base changes lies in the stem of a hairpin; all lie in the regions between them. Most of the region between the end of gene IV (pos. 5501) and hairpin C does not seem to be necessary for propagation of the DNA. By cloning parts of the IG in plasmid pBR322 under conditions not permissive for ColE1-directed replication, it has been shown that positions 5727-5868, containing the start site of the ori RNA (Geider et al., 1978) and the nicking site of the gene II protein (Meyer et al., 1979), are sufficient, in the presence of helper virus, for replication of the hybrid replicon (Cleary and Ray, 1980; Sommer, 1981). In a pseudo-wild-type fd, a revertant of a transposon-containing phage, a deletion of 64 nucleotides (pos. 5553-5618) was observed which removed part of hairpin A and the pyrimidine-rich region between hairpins A and B (Auerswald, 1979). This region is obviously dispensable for phage multiplication. A ColE1 vector containing the left half of the IG, from the end of gene IV to the HaeIII fragment mentioned above (pos. 5489-5868), not only replicates under phage control but also efficiently packages single (plus) strands of the vector into phage coat protein (Sommer, 1981). The "packing origin" must therefore be located in the left half of the IG; this indicates that hairpin A is involved in this function, as speculated earlier (Schaller, 1979).
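Hairpins such as those in the IG rest on inverted repeats: a stretch followed, after a short loop, by its reverse complement. A crude screen for perfect stems can be sketched as follows, assuming uppercase DNA letters; the function names and loop limits are illustrative choices, and real secondary-structure prediction uses free-energy models that also allow G·T pairs, mismatches and bulges, all of which this ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_stems(seq, stem, max_loop, min_loop=3):
    """Yield (i, j), 0-based, such that seq[i:i+stem] pairs perfectly with
    seq[j-stem:j] across a loop of min_loop..max_loop unpaired bases."""
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + 2 * stem + loop  # exclusive end of the right arm
            if j > len(seq):
                break
            if seq[j - stem:j] == target:
                yield (i, j)
```

For example, `ACGGTC` + `TTTT` + `GACCGT` is reported as one 6-bp stem closing a 4-base loop.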

In vitro, transcription starts at several promoters located in front of almost every gene (except genes VII and IX) and proceeds unidirectionally to a single Rho-independent stop signal immediately after gene VIII. In this way more RNA copies are produced of the genes proximal to the central terminator than of the more distal genes. This polar effect is amplified by the fact that the strongest promoters are located in the region preceding the termination signal, and the genes encoded in this region specify the proteins the phage needs in the largest amounts. In vivo, this "cascade" model of transcription could be confirmed only for the region between the IG and the central terminator. Some of the RNA species of this region are obviously processed in the cell (Smits et al., 1980). In the other part of the genome there are at least two additional Rho-dependent termination signals, which stop transcription behind genes VI and IV (Smits et al., 1980).
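The polarity of the cascade model follows from simple bookkeeping: every transcript runs to the common terminator, so a gene is covered by the output of its own promoter plus that of every promoter upstream. A minimal sketch with hypothetical gene labels and relative strengths; it ignores RNA processing, degradation and Rho-dependent termination.

```python
def transcript_dosage(order, promoter_strength):
    """Relative RNA copies covering each gene under a simple cascade model.

    order             -- gene names listed 5'->3' up to a single terminator
    promoter_strength -- {gene: relative strength} for a promoter placed
                         directly in front of that gene (missing = no promoter)
    """
    dosage, running = {}, 0.0
    for gene in order:
        running += promoter_strength.get(gene, 0.0)  # add this promoter's output
        dosage[gene] = running  # all upstream transcripts also cover this gene
    return dosage


# Hypothetical example: the gene nearest the terminator gets the most RNA.
print(transcript_dosage(["g1", "g2", "g3", "g4"], {"g1": 1.0, "g3": 2.0, "g4": 4.0}))
```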
In addition to the promoters already known (reviewed e.g. by Edens et al., 1976), H. Schaller (unpublished data) mapped some weaker start points of RNA synthesis. The mixture of all DNA fragments protected by RNA polymerase against pancreatic DNase digestion (pDNA) was isolated, 5'-end-labeled and used to prime repair synthesis on fd DNA single strands. The mixture of the extended promoter regions was then cleaved with several different restriction enzymes and separated on polyacrylamide gels. By secondary cleavage with other restriction enzymes, most of the resulting fragments could be positioned on the physical map. In addition, partial DNA sequences of nine of the eleven binding sites isolated by this approach were determined. All these sites are listed in Fig. 4. The last nucleotide of each sequence in this figure corresponds to the first base of the (complementary) coding strand protected in the RNA polymerase-promoter complex.

Fig. 4. Nucleotide sequences of promoter sites in fd DNA. The sequences are aligned with respect to the known initiation nucleotides (→) and the RNA polymerase recognition sites. The rightmost nucleotide in each line corresponds to the first protected base in the pDNA (minus) strand determined by H. Schaller, as described in the text. Homologies to the consensus sequences around positions -35 and -10 (top line), as compiled by Siebenlist et al. (1980), are underlined. The upper four (strong G-start) promoters have been ordered according to their relative strength (Seeburg et al., 1977). Base exchanges to f1 and M13 DNA are marked by asterisks (see Fig. 2).

The sequences show homologies around positions -10 and -35 to other promoter sites of E. coli RNA polymerase (reviews: Rosenberg and Court, 1979; Siebenlist et al., 1980). In the weaker promoters (lower part of Fig. 4) this homology is less pronounced.
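Homology to the -10 (TATAAT) and -35 (TTGACA) consensus hexamers can be quantified as the mismatch count of the best-matching window. A minimal sketch with illustrative function names; it scores each hexamer independently and ignores the spacing between the two boxes. The test fragment is a short stretch around the gene V -10 region copied from the extracted Fig. 4 data.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hexamer(region, consensus):
    """Return (0-based offset, window, mismatches) for the window of `region`
    closest to `consensus`; the first best window wins ties."""
    region = region.upper()
    k = len(consensus)
    best = min(range(len(region) - k + 1),
               key=lambda i: mismatches(region[i:i + k], consensus))
    window = region[best:best + k]
    return best, window, mismatches(window, consensus)
```

A perfect Pribnow hexamer, as described for the gene V promoter, scores 0 mismatches.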

In most cases promoters are integrated into the end of the preceding gene. The positions of the polymerase-binding sites in front of genes II, X, V and VIII (positions 5940-5980, 400-440, 790-830 and 1170-1210, respectively) coincide with four strong G-start promoters determined by in vitro transcription of restriction fragments with RNA polymerase (Seeburg and Schaller, 1975; Edens et al., 1976). The gene V promoter was positioned incorrectly in previous publications (Schaller et al., 1978; Van Wezenbeek et al., 1980), because a provisional fd sequence was used in the initial interpretation of the mapping of the RNA polymerase-binding sites mentioned above. The end of the polymerase-protected DNA was determined to lie 105 nucleotides away from the next restriction cleavage site, which is at position 723 and not at position 741. The thus derived
polymerase-binding site shows a perfect Pribnow hexamer. A corrected assignment of this promoter in the fd DNA sequence has been presented (Siebenlist et al., 1980), but it also has to be shifted by one nucleotide in the -35 region, since in M13, as well as in f1, there is a T at position 783 instead of a C. An exchange of the corresponding C at position -32 for a T causes a down mutation in the λ PL promoter (sex1 mutant; Kleid et al., 1976). The alignment of the -35 region proposed here results in a homology pattern equivalent to that shown by Siebenlist et al. (1980), with the base exchange at a point of nonhomology. In the II' promoter, which is a strong RNA start site in vitro, position -33 is converted into the "ideal" form by a T→A base exchange in f1 and M13 (pos. 6213). No mutant is known at this position in other promoters, and whether the frequency of RNA initiation at this site differs between fd on the one hand and f1 and M13 on the other has not yet been measured.

TABLE IV
Amber mutants of fd, f1 and M13

Gene  Phage  Name of mutant              Position  Base exchange
II    f1     R124                        6349      C→T
V     fd     fd122 a                     906       C→T
      M13    5H1 a, 5H3 a, 5H27 a        999       C→T
      f1     R13                         999       C→T
VII   M13    7H2 b                       1114      C→T
      M13    7H3 b                       1141      C→T
VIII  M13    8H1 c                       1373      G→T
III   M13    3H1 a, 3H4 a                2017      C→T
      M13    3H5 a                       2473      C→T
VI    f1     R5, R7                      3066      C→T
      M13    6H1 a, 6H2 a, 6H3 a, 6H6 a  3066      C→T
I     M13    1H7 a                       3263      C→T
IV    f1     R143                        5265      C→T

a Van Wezenbeek et al. (1980); b Hulsebos and Schoenmakers (1978); c Boeke and Model (1979).

TABLE V
Amino acid exchanges in proteins between fd, f1 and M13

Gene  Amino acid  Amino acid in           Pos. of exchange
      position    fd     f1      M13      in DNA
II    249         Glu    Glu     Lys      343
      274         Arg    Ser     Ser      420
VIII  35          Asp    Asp     Asn      1403
III   374         Pro    Leu a   Pro      2699
      375         Tyr    Phe     Phe      2702
      378         Gly    Arg     Ser      2710
I     142         His    Asn     His      3620
      164         Val    Ile     Ile      3686
      326         Ile    Leu     Leu      4172
IV    30          Pro    Ser     Pro      4308
      42          Thr    Thr     Ser      4344
      70          Asn    Asp     Asn      4428
      98          Ser    Asn     Asn      4513
      110         Ile    Asn     Asn      4549
      166         Val    Val     Ile      4716

a This exchange was observed only in f1 amber mutant R5.

There are several other changes within the promoters of the filamentous phages, indicated in Fig. 3. All of them concern positions outside the polymerase recognition sites, except for a C→G exchange in the -35 region of promoter IV'. It is questionable whether this promoter and the I' promoter have any function in vivo, since these sites were detected only as polymerase-binding sites in vitro and not by a transcription product.

The position of the gene VIII promoter was also established by sequence analysis of the gene transcript, which starts with pppG4 at position 1196 (Takanami et al., 1976). Gene IX is encoded by the same mRNA; its ATG start codon lies 10 nucleotides downstream from the beginning of the mRNA. Transcripts probably do not start exclusively at one single position: as with the "wobbling" start of gene X mRNA (Nüsslein and Schaller, 1975), a proportion may initiate either one nucleotide before or one after, giving rise to varying numbers of G residues at the 5' end. Only the longest RNA chains, starting with G5, may offer an efficient ribosome-binding site, which perhaps accounts for the low expression of gene IX.

Three RNA polymerase-binding sites in front of genes VI, I and IV (pos. 2740-2780, 3100-3140 and 4080-4120, respectively) confirm the positions of three A-start promoters, also determined by in vitro transcription of restriction fragments (Edens et al., 1976). A further binding site (pos. 1510-1550) partially overlaps the central termination signal for transcription and defines the position of the gene III promoter. The mRNA probably starts at position 1544 with pppU (M. Takanami, personal communication; Edens et al., 1978).

Sequences preceding the start codons of the genes listed in Fig. 5 show varying degrees of complementarity to the 3' end of 16S rRNA (Shine and Dalgarno, 1974). Three ribosome-binding sites of phage f1 were isolated as early as 1974 from ribosome-RNA complexes, and their nucleotide sequences were analysed (Pieczenik et al., 1974). In establishing the DNA sequence it became apparent that these sites belonged to genes V, VI and VIII.
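The degree of complementarity to 16S rRNA can be scored as the longest stretch of the upstream sequence that could pair perfectly with the rRNA 3' end. A minimal sketch, assuming we read the rRNA stretch shown in Fig. 5 as 3'-AUUCCUCCACUAG-5' and allowing Watson-Crick pairs only (no G·U); the names are illustrative. The test input for gene IX is the 16 nucleotides preceding its ATG in the extracted Fig. 5 data.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 16S rRNA 3' end as shown in Fig. 5 (3'-AUUCCUCCACUAG), rewritten 5'->3'
# in DNA letters.
RRNA_3_END = "GATCACCTCCTTA"


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def longest_sd_match(upstream):
    """Longest stretch of `upstream` (mRNA-sense DNA letters) that could pair
    with the 16S rRNA 3' end using Watson-Crick pairs only."""
    target = revcomp(RRNA_3_END)  # mRNA-sense complement: TAAGGAGGTGATC
    up, best = upstream.upper(), ""
    for i in range(len(up)):
        j = i + len(best) + 1  # only look for stretches longer than `best`
        while j <= len(up) and up[i:j] in target:
            best = up[i:j]
            j += 1
    return best
```

A canonical AAGGAGGT-type site scores 8, whereas the gene IX upstream region yields only a three-base match (GGT), in line with the weak expression of that gene discussed above.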

The start codons of all genes are ATG, except for gene III, which starts with GTG, possibly contributing to the low expression rate of this gene. In genes that are efficiently expressed (e.g. genes II, V and VIII) an A follows the ATG codon, in agreement with the hypothesis that the fourth base in the f-Met-tRNA anticodon is involved in the formation of the translation-initiation complex (Taniguchi and Weissmann, 1978).

Fig. 5. Nucleotide sequences of ribosome-binding sites in fd DNA. Nucleotides complementary to the 3'-terminus of 16S rRNA (Shine and Dalgarno, 1974) are underlined. Palindrome structures near the start codon are indicated by arrows, and stop signals preceding the start codons are boxed.

In all these ribosome-binding sites a palindrome can be observed more or less clearly (indicated by arrows in Fig. 5), which allows part of the sequence upstream of the start codon to base-pair with the sequence downstream, thus exposing the ATG triplet at the top of a small hairpin structure. Such structures were first considered as possible translation recognition signals in other systems (Steitz and Jakes, 1975), but this idea was later rejected (Steitz, 1979). Similar structures are also recognizable at other ribosome-binding sites (coat and A proteins of f2, MS2 and Qβ; genes C and F of ΦX174; genes lacI, galE and galT of E. coli), as listed in Steitz (1979).

Stop codons immediately precede or overlap translation start signals, primarily because of the close packing of genes in the filamentous phage genome (see above). However, this arrangement may also facilitate re-initiation of translation: the ribosomes stop at a position that allows Shine-Dalgarno base pairing to occur anew.

The completed f1 DNA sequence shows this phage to be closely related to the two other filamentous bacteriophages fd and M13. The small gene products are almost all identical to their counterparts, whereas the amino acid sequences of the larger proteins diverge from one another by as much as 2%. Regulatory elements also vary only slightly in their essential parts. More variable regions lie in the IG between highly conserved segments, the latter probably representing structurally functional domains. Such variable regions can in part be deleted or replaced by heterologous DNA, which has allowed the filamentous phages to be used as efficient cloning vehicles (Messing et al., 1977; Herrmann et al., 1980).